Grammatical word class variation within the British National Corpus Sampler
Abstract
This paper examines the relationship between part-of-speech frequencies and text typology in the British National Corpus Sampler. Four pairwise comparisons of part-of-speech frequencies were made: written vs. spoken language; informative vs. imaginative writing; conversational speech vs. ‘task-oriented’ speech; and imaginative writing vs. ‘task-oriented’ speech. The following variation gradient was hypothesized: conversation – task-oriented speech – imaginative writing – informative writing. The observed progression, however, was: conversation – imaginative writing – task-oriented speech – informative writing. Genre and medium thus appear to interact in a more complex way than originally hypothesized. This conclusion, however, rests on broad, pre-existing text types within the BNC; future work may need to address the internal structure of these text types.
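The abstract does not state which statistic was used for the pairwise comparisons. As a minimal sketch, assuming each text type is summarised as a table of per-tag counts and using the log-likelihood (G²) statistic commonly applied in corpus comparison, one such comparison could look like the following. The function names `log_likelihood` and `compare_pos`, the coarse tags, and the toy counts are all hypothetical illustrations, not the author's method or BNC Sampler figures.

```python
import math
from typing import Dict, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for an item seen a times in a sample of c tokens and b times in a sample of d tokens."""
    total = c + d
    e1 = c * (a + b) / total  # expected count in sample 1 under "no difference"
    e2 = d * (a + b) / total  # expected count in sample 2 under "no difference"
    g2 = 0.0
    # An observed count of 0 contributes 0 (the limit of x * ln x as x -> 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def compare_pos(counts1: Dict[str, int], counts2: Dict[str, int]) -> Dict[str, Tuple[float, str]]:
    """Compare per-tag counts for two text types (hypothetical helper).

    Returns {tag: (G2, direction)}, where direction is '+' if the tag is relatively
    more frequent in the first text type, '-' if less frequent, '=' if equal.
    """
    c = sum(counts1.values())  # tagged tokens in text type 1
    d = sum(counts2.values())  # tagged tokens in text type 2
    if c == 0 or d == 0:
        raise ValueError("both text types need at least one tagged token")
    result: Dict[str, Tuple[float, str]] = {}
    for tag in sorted(set(counts1) | set(counts2)):
        a = counts1.get(tag, 0)
        b = counts2.get(tag, 0)
        rel1, rel2 = a / c, b / d  # relative frequencies
        direction = "+" if rel1 > rel2 else "-" if rel1 < rel2 else "="
        result[tag] = (log_likelihood(a, b, c, d), direction)
    return result


# Toy counts for illustration only; these are NOT figures from the BNC Sampler.
written = {"NOUN": 30, "PRON": 10}
spoken = {"NOUN": 15, "PRON": 25}
for tag, (g2, direction) in compare_pos(written, spoken).items():
    print(f"{tag}\t{direction}\tG2={g2:.2f}")
```

Each of the four comparisons (e.g. written vs. spoken) would correspond to one call of `compare_pos`; how the per-tag differences were then combined into a single variation gradient is not described in the abstract, so the sketch stops at the pairwise step.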
Similar resources
Unsupervised All-words Word Sense Disambiguation with Grammatical Dependencies
We present experiments that analyze the necessity of using a highly interconnected word/sense graph for unsupervised all-words word sense disambiguation. We show that allowing only grammatically related words to influence each other’s senses leads to disambiguation results on a par with the best graph-based systems, while greatly reducing the computational load. We also compare two methods for com...
Using the BNC to produce
This paper describes an attempt to generate seemingly meaningful cryptic crossword clues without trying to analyse meaning but relying solely on word occurrence statistics. It is a continuation of a project in which I developed an application toolkit for cryptic crossword clue compilers. The software described here assembles simple cryptic clues using the resources developed in the earlier proj...
Investigating the collocational behaviour of MAN and WOMAN in the BNC using Sketch Engine
In this paper, I examine the representation of men and women in the British National Corpus (BNC) by focussing on the collocational and grammatical behaviour of the noun lemmas MAN and WOMAN (i.e., the nouns man/men and woman/women). Using Sketch Engine (a powerful corpus query tool, which is described) I explore the functional distribution of the target lemmas, and reveal the structured and sy...
Linggle Knows: A Search Engine Tells How People Write
This paper presents Linggle Knows, an English grammar and linguistic search engine. Linggle Knows helps people write by displaying lexical and grammatical information extracted from a couple of large-scale corpora, including Google Web 1T 5-gram, British National Corpus (BNC), New York Times Annotated Corpus (NYT), etc. It not only describes how a word is genuinely used, but also recommends va...
CLAWS4: The Tagging of the British National Corpus
The main purpose of this paper is to describe the CLAWS4 general-purpose grammatical tagger, used for the tagging of the 100-million-word British National Corpus, of which c. 70 million words have been tagged at the time of writing (April 1994). We will emphasise the goals of (a) general-purpose adaptability, (b) incorporation of linguistic knowledge to improve quality and consistency, and (c) ...